Distributed Graph Clustering and Sparsification

نویسندگان

  • He Sun
  • Luca Zanetti
چکیده

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation, and cannot be directly applied for clustering many networks that occur in practice, whose information is often collected on different sites. Designing a simple and distributed clustering algorithm is of great interest, and has wide applications for processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds, and recovers a partition of the graph close to optimal. One of the main components behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the clusterstructure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this component is easy to implement in a distributed way and runs fast in practice.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Uncertain Graph Sparsification

Uncertain graphs are prevalent in several applications including communications systems, biological databases and social networks. The ever increasing size of the underlying data renders both graph storage and query processing extremely expensive. Sparsification has often been used to reduce the size of deterministic graphs by maintaining only the important edges. However, adaptation of determi...

متن کامل

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

The eigendeomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly-scalable, spectrum-preserving graph sparsification algorithm that enables to build ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectrums, such as the first few eigenvectors of the original gr...

متن کامل

Lecture 04 / 13 : Graph Sampling and Sparsification

In previous lectures, we discussed various topics on graph: sparsest cut, clustering which finds small k-cuts, max flow, multicommodity flow, etc. For very large graphs, we’d like to ask “instead of regular approximation algorithms which run in polynomial time, can we find any near-linear time algorithm for minimum cut or maximum flow problems?”. A simple idea is for a given graph G, to solve t...

متن کامل

Single pass graph sparsification in distributed stream processing

We give a distributed one pass streaming algorithm for graph sparsification. Besides producing a sparsifier, our algorithm maintains a hierarchy of UNION-FIND data structures in a distributed manner that efficiently support queries of strong connectivities between pairs of vertices. An important component of the algorithm is an implementation of UNION-FIND queries over an Active Distributed Has...

متن کامل

Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs

Big graphs (networks) arising in numerous application areas pose significant challenges for graph analysts as these graphs grow to billions of nodes and edges and are prohibitively large to fit in the main memory. Finding the number of triangles in a graph is an important problem in the mining and analysis of graphs. In this paper, we present two efficient MPI-based distributed memory parallel ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1711.01262  شماره 

صفحات  -

تاریخ انتشار 2017